Interactive document reading

ABSTRACT

A system for assisting a user in extracting information from a document set including at least one original document 2 having content comprises: a pen 201 arranged to make pen strokes on a representation of the document 2, a recording system 232, 238, 242 arranged to record the position of the pen strokes on the representation, and a processor 210, 212 arranged to interpret the pen strokes as identifying selected parts of the content on the document 2 and to produce a reference document relating to the document set, the content of the reference document being dependent on the selected content.

FIELD OF THE INVENTION

The present invention relates to the extraction of information from documents, and in particular to systems that aid a user to extract information from documents that they are reading.

BACKGROUND TO THE INVENTION

It has been demonstrated that when a reader reads a document they take in the information in the document more effectively if they read interactively. This includes marking the document as it is read, for example by underlining relevant words or passages or highlighting them in other ways. This also means that the marked document, when referred to again, will be easier to read as the words or passages of interest will be highlighted.

SUMMARY OF THE INVENTION

The present invention therefore provides a system for assisting a user in extracting information from a document set including at least one original document having content, the system comprising: a pen arranged to be moved over a representation of the original document to define pen strokes, a recording system arranged to record the position of the pen strokes on the representation, and a processor arranged to interpret the pen strokes as identifying selected parts of the content and to produce a reference document relating to the document set, the content of the reference document being dependent on the selected content.

The processor can comprise any suitable processing system, and may comprise a number of processing units arranged to operate together to process the pen strokes. The pen may be arranged to mark the representation of the original document, or it may comprise a simple pointing device such as a stylus. It may comprise part of a more complex system, for example being a light pen.

The content can be in any of a number of forms. For example, it may comprise text, images, drawings, or tables of figures or symbols.

The reference document may be human readable, either directly or by being representable or reproducible in a human readable form. For example the reference document may be an electronic document that can be displayed on screen or printed, or it may be a hard copy document.

The representation may comprise a hard copy of the document, or it may comprise a display of the document, for example on a display screen.

The reference document may include a copy of the document set with additional content, or links to additional content, or an index or summary added to aid re-reading of the document. Alternatively it may comprise a separate document, such as a summary or index of the original document set.

The processor may be arranged to search for other documents using a search strategy determined by the selected content, and to include the other documents in the set. In this case the reference document may simply identify the documents in the set, or it may include an indication of the relevance of at least one of the documents in the set.

The system may be arranged for use by a single user, or it may be arranged to identify a plurality of users, and to produce one reference document for each of the users, using pen strokes made by the respective user. The system may be arranged to identify each user on the basis of the identity of the pen that made the pen strokes, or by other methods such as the use of user names.

The present invention further provides a system for extracting information from a document set including at least one original document having content, the system comprising: a position determining means arranged to receive data defining the position of pen strokes made on a representation of the document by a pen, and processing means arranged to interpret the pen strokes as identifying selected parts of the text on the document and to produce a reference document relating to the document set, the content of the reference document being dependent on the selected content.

The present invention further provides a system for assisting a user in extracting information from an original document having content, the system comprising a manually operable selecting device, operable in conjunction with a representation of the original document to select portions of the content, and a processor arranged to produce a reference document relating to the original document, the content of the reference document being dependent on the selected portions.

The selecting device may be a hand held device. It may also be arranged to be placed in contact with, or close to, the representation in order to select the content. In this case the selecting device may be arranged either to make marks on the representation or simply to move over it. Alternatively the selecting device may be arranged to interact with the representation in some other way, for example by directing a light beam at the representation such that the light beam can be detected. Where the representation is a display, for example on a display screen, the selecting device may be arranged to operate by moving a cursor or other highlighting or selecting device on the screen.

The present invention further provides corresponding methods, and also a data carrier carrying data arranged to control relevant systems to operate as a system according to the invention and to perform the methods of the invention. The data carrier can comprise, for example, a floppy disk, a CDROM, a DVD ROM/RAM (including +RW, −RW), a hard drive, a non-volatile memory, any form of magneto optical disk, a wire, a transmitted signal (which may comprise an internet download, an ftp transfer, or the like), or any other form of computer readable medium.

Preferred embodiments of the present invention will now be described by way of example only with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a document having content and a position identifying pattern on it;

FIG. 2 is a schematic view of a system according to a first embodiment of the invention for use with the document of FIG. 1;

FIG. 3 is a diagrammatic view of some of the functional components of the system of FIG. 2;

FIG. 4 is a flow diagram showing a first method of operation of the system of FIG. 2;

FIG. 5 shows the system of FIG. 2 connected to the internet;

FIG. 6 is a flow diagram showing another method of operation of the system of FIG. 2;

FIG. 7 is a flow diagram showing another method of operation of the system of FIG. 2;

FIG. 8 is a schematic view of a system according to a second embodiment of the invention; and

FIG. 9 is a diagrammatic view of some of the functional components of the system of FIG. 8.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to FIG. 1, systems of the present invention can be arranged for use with documents 2 that have written content 4 and a position identifying pattern 6 thereon. The written content 4 can be in any form, and could, for example, comprise a newspaper or journal article, a novel or short story, an agenda for a meeting, or an index or list. The content is printed onto the document by any suitable process. The position identifying pattern covers the whole of the document 2, although only a small area of it is shown in FIG. 1. The position identifying pattern is made up of a number of graphical elements comprising black ink dots 8 arranged on an imaginary grid 10. The grid 10, which is shown in FIG. 1 for clarity but is not actually marked on the document 2, can be considered as being made up of horizontal and vertical lines 12, 14 defining a number of intersections 16 where they cross. The intersections 16 are of the order of 0.3 mm apart, and the dots 8 are of the order of 100 μm across. One dot 8 is provided at each intersection 16, but offset slightly in one of four possible directions, up, down, left or right, from the actual intersection 16. The dot offsets are arranged to vary in a systematic way so that any group of a sufficient number of dots 8, for example any group of 36 dots arranged in a six by six square, will be unique within a very large area of the pattern. This large area is defined as a total imaginary pattern space, and only a small part of the pattern space is taken up by the pattern on the document 2. An example of this type of pattern is described in WO 01/26033.
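By way of illustration only, the principle of locating a position from a six by six group of dot offsets can be sketched as follows. This is a toy model under stated assumptions: the offsets are drawn from a seeded pseudo-random generator rather than from the systematic scheme of WO 01/26033, which is not reproduced here, and the names `make_pattern`, `window_at`, `build_index` and `locate` are hypothetical.

```python
import random

DIRECTIONS = ("up", "down", "left", "right")  # the four possible dot offsets
WINDOW = 6  # six by six group of dots, as in the example in the text


def make_pattern(rows, cols, seed=0):
    """Assumed stand-in for the real pattern: one random offset per intersection."""
    rng = random.Random(seed)
    return [[rng.choice(DIRECTIONS) for _ in range(cols)] for _ in range(rows)]


def window_at(pattern, row, col):
    """Return the WINDOW x WINDOW group of offsets whose top-left dot is (row, col)."""
    return tuple(tuple(pattern[r][col:col + WINDOW]) for r in range(row, row + WINDOW))


def build_index(pattern):
    """Map every window to its position, insisting that each window is unique."""
    index = {}
    rows, cols = len(pattern), len(pattern[0])
    for r in range(rows - WINDOW + 1):
        for c in range(cols - WINDOW + 1):
            key = window_at(pattern, r, c)
            if key in index:
                raise ValueError("window is not unique in this pattern")
            index[key] = (r, c)
    return index


def locate(index, window):
    """Return the grid position of an imaged window, or None if it is unknown."""
    return index.get(window)


pattern = make_pattern(40, 40)
index = build_index(pattern)
position = locate(index, window_at(pattern, 7, 12))
```

A lookup table is used here only because the toy pattern is small; a pattern covering a very large pattern space would need a decoding scheme that does not enumerate every window.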

The position identifying pattern 6 can be detected by a sensing system mounted in a pen, as will be described below, so that the position of marks made on the document 2 by the pen can be detected.

Referring to FIGS. 2 and 3, a system according to an embodiment of the invention, for interactive reading of documents 2 having the position identifying pattern 6 on them, comprises a personal computer (PC) 200, a pen 201, and a printer 202. The PC 200 has a screen 204, a keyboard 206 and a mouse 208 connected to it to provide a user interface 209 as shown generally in FIG. 3. As also shown in FIG. 3, the PC 200 comprises a processor 210 and a pattern allocation module 212 which is a software module stored in memory. The pattern allocation module 212 includes the definition of the total area of pattern space and a record of which parts of that total area have been allocated to specific documents, for example by means of coordinate references. The PC 200 further comprises a printer driver 214, which is a further software module, and a memory 216 having electronic documents 218 stored in it. The electronic documents can have been produced in any suitable manner. For example they may have been generated on the PC using a word processing or drawing package, or they may have been produced by scanning hard copy documents. Alternatively they may have been downloaded to the PC from another source, such as a disc, a local network server, or the internet. The user interface 209 allows a user to interact with the PC 200.

In order to produce the printed documents 2 the processor 210 retrieves an electronic document 218 from the memory 216 and sends it to the printer driver 214. The printer driver 214 allocates a unique document identification code to the document to be printed and requests the required pattern area from the pattern allocation module 212, which communicates the details of the pattern, including the positions of all the required dots, back to the printer driver 214. The printer driver 214 then adds the pattern 6 to the electronic document to form an image which includes the pattern 6 and the content 4, converts the document including the pattern 6 to a format suitable for the printer 202, and sends it to the printer 202 which prints the document 2 including the pattern area 6. The exact position of the text on the printed document can change each time the document is printed out. The pattern allocation module 212 therefore stores details of each printed instance of the document including the position on the printed document of all of the content features of the document.
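The record keeping performed by a pattern allocation module of this kind can be sketched as below. This is a minimal sketch, not the module 212 itself: the class and method names are hypothetical, the coordinates are arbitrary pattern-space units, and the side-by-side allocation strategy is an assumption made only to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """A rectangle of pattern space allocated to one printed document instance."""
    doc_id: str
    x0: int
    y0: int
    width: int
    height: int

    def contains(self, x, y):
        return (self.x0 <= x < self.x0 + self.width
                and self.y0 <= y < self.y0 + self.height)


class PatternAllocator:
    """Assumed strategy: areas are handed out side by side along the x axis."""

    def __init__(self, total_width, total_height):
        self.total_width = total_width
        self.total_height = total_height
        self.areas = []
        self._next_x = 0

    def allocate(self, doc_id, width, height):
        """Reserve an unused area for a document and record the allocation."""
        if height > self.total_height or self._next_x + width > self.total_width:
            raise ValueError("pattern space exhausted")
        area = Area(doc_id, self._next_x, 0, width, height)
        self._next_x += width
        self.areas.append(area)
        return area

    def locate(self, x, y):
        """Map a pattern-space position to (document, position on that document)."""
        for area in self.areas:
            if area.contains(x, y):
                return area.doc_id, (x - area.x0, y - area.y0)
        return None


allocator = PatternAllocator(total_width=10_000, total_height=1_000)
allocator.allocate("doc-1", 700, 990)
allocator.allocate("doc-2", 700, 990)
hit = allocator.locate(750, 10)
```

The same `locate` lookup is what allows pen stroke positions received later to be traced back to a particular printed document.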

In practice the various components of the system can be spread out over a local network or the internet. For example the pattern allocation module 212 can be provided on a separate internet connected server so that it can be accessed by a number of users.

Referring still to FIG. 2, the pen 201 for reading the pattern 6 comprises a writing nib 230, and a camera 232 made up of an infra red (IR) LED 234 and an IR sensor 236. The camera 232 is arranged to image a circular area adjacent to the tip 231 of the pen nib 230. A processor 238 processes images from the camera 232 taken at a predetermined rapid sample rate. A pressure sensor 240 detects when the nib 230 is in contact with the document 2 and triggers operation of the camera 232. Whenever the pen 201 is being used on a patterned area of the document 2, the processor 238 can therefore determine from the pattern 6 the position on the document 2 over which the pen 201 is being passed. The sequence of positions is saved in the pen's memory 242 as pen stroke data, and can be transmitted to the PC 200 via a radio frequency transmitter 244 in the pen 201. Suitable pens are available from Logitech under the trade mark Logitech Io.

Referring back to FIG. 3, the PC 200 further comprises a radio frequency receiver 220 and an input/output module 222 which processes the signals received by the receiver 220 and inputs them to the processor 210. It also includes a pen stroke interpretation module 224 which is arranged to interpret the pen stroke data from the pen 201 and an application 226 which uses the pen stroke data to perform various functions related to the documents 2.

In use, a user creates one or more documents 218 in electronic form using the application 226, and these are stored in the PC's memory 216. These electronic documents 218 include definitions of written content 4, and may also comprise definitions of other forms of content such as drawings. The documents 218 can be displayed on the screen 204 of the PC and read directly from the screen. However, in this case, a hard copy of the document 2 is printed together with the position identifying pattern 6 as described above. When printing, the printer driver 214 identifies the layout of the printed document, and communicates that layout information to the pattern allocation module 212.

As the user reads the document 2, he can mark it in various ways using the tip 231 of the pen nib 230 to select or highlight various parts of the text. These might be individual words, passages, sentences, paragraphs or sections. In the example shown in FIG. 1, the first sentence of the first paragraph is selected by a single underline 20. The last word of the first paragraph, in this case “paragraphs”, is selected by a ring 22 around the word. The whole of the second paragraph is selected by means of a mark 24 in the margin 26, which in this case is a double line extending vertically down the side of the paragraph. Finally the single word “different” is selected from the second paragraph by a single underline 28.

It will be appreciated that the pen strokes can be made in a number of different ways depending on the nature of the pen. For example the pen could be arranged to act as a highlighter pen so that simply passing it over a word or part of the content would select that word or part.

As the marks 20, 22, 24, 28 are made, the pen 201 identifies the position and shape of the marks in pattern space and records this information as pen stroke data. When the document 2 has been read and marked by the user, the pen 201 is arranged to transmit the pen stroke data defining the marks 20, 22, 24, 28 to the PC. The transmitting of the data can be initiated in a number of ways, for example by marking a specific area of the document 2 that can be recognised by the pen 201 as a ‘send box’ causing the transmission of the data, or by making a mark of a particular shape that is recognised by the pen as an instruction to transmit the data.

When the PC receives the pen stroke data, the pattern allocation module determines from the position in pattern space of the marks 20, 22, 24, 28, which document they have been made on, in this case the document 2, and the position on that document in which the marks have been made. The application 226 then retrieves the electronic copy 218 of the document 2 from the memory 216, and the definition stored in the pattern allocation module 212 of the printed document. This definition includes data defining all of the text and other content on the document and its position on the document. By combining the content data and the pen stroke data, the application 226 can determine which words, phrases, sentences, paragraphs or passages, or which drawings, diagrams or tables, of the document 2 have been highlighted, and in what manner.
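The combination of content data and pen stroke data can be sketched as a simple geometric test. The following is an illustrative sketch only: the thresholds, the three mark types and the names `classify` and `select` are assumptions, and a practical pen stroke interpretation module would need to be considerably more tolerant of untidy strokes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float  # y increases down the page


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def bounds(stroke):
    """Bounding box of a stroke given as a sequence of (x, y) positions."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return Box(min(xs), min(ys), max(xs), max(ys))


def classify(stroke, margin_x):
    """Assumed heuristic: margin line, underline, or ring, judged from the bounds."""
    b = bounds(stroke)
    width, height = b.x1 - b.x0, b.y1 - b.y0
    if b.x1 <= margin_x and height > width:
        return "margin"
    if height <= 0.2 * width:
        return "underline"
    return "ring"


def select(words, stroke, margin_x, tolerance=2.0):
    """Return the kind of mark and the words it selects."""
    kind, b = classify(stroke, margin_x), bounds(stroke)
    chosen = []
    for w in words:
        if kind == "margin":
            # every word whose line overlaps the vertical extent of the mark
            hit = w.box.y0 < b.y1 and w.box.y1 > b.y0
        elif kind == "underline":
            # words directly above the stroke, within a small vertical tolerance
            hit = (w.box.x0 < b.x1 and w.box.x1 > b.x0
                   and abs(b.y0 - w.box.y1) <= tolerance)
        else:
            # ring: words whose centre lies inside the ring's bounds
            cx, cy = (w.box.x0 + w.box.x1) / 2, (w.box.y0 + w.box.y1) / 2
            hit = b.x0 <= cx <= b.x1 and b.y0 <= cy <= b.y1
        if hit:
            chosen.append(w.text)
    return kind, chosen


words = [
    Word("Pen", Box(10, 10, 30, 20)),
    Word("strokes", Box(35, 10, 70, 20)),
    Word("select", Box(75, 10, 105, 20)),
    Word("text", Box(10, 30, 30, 40)),
]
underline = select(words, [(34, 21), (50, 21.5), (71, 21)], margin_x=5)
ring = select(words, [(8, 28), (32, 28), (32, 42), (8, 42), (8, 28)], margin_x=5)
margin = select(words, [(2, 9), (2, 21)], margin_x=5)
```

Here the word positions play the role of the stored definition of the printed document, and the strokes are assumed to have already been converted from pattern space to positions on the page.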

When the highlighted content of the document has been identified, the application 226 can use this information in a number of ways, which can be selected by the user from a suitable menu. One option is for the application 226 to produce a modified electronic version of the document 2 in which the selected content is highlighted. The highlighting can be selected to correspond to the marks made on the original document 2, being made up of lines underlining, circling, or marking in the margin the selected text or drawings. Alternatively the highlighting can be selected to take a different form. For example highlighted text can be converted to a different font, having a different font size, being underlined or in bold, having a different colour, and highlighted drawings or diagrams can be shrunk or simplified. This modified document can then be saved and either viewed on the screen 204 of the PC 200, or printed again for re-reading.

Another option that can be selected is for a summary of the document 2 to be produced, taking into account the selected content. Referring to FIG. 4, in the automatic summarising process the application 226 first identifies text to be summarised at step 401. This can be done on the basis of user inputs to the PC 200 via the user interface, or on the basis of predetermined rules. In this example the summary is of the whole of the document 2. Then at step 402 the processor identifies words and phrases in the document and gives them a weighting based on a number of factors including the number of times they occur. The weighting given to each part of the text is then modified at step 403, as described below, to take into account the selected text. On the basis of the weightings of words and phrases the application identifies sentences that best summarise the whole document, and uses them to produce the summary at step 404.

In the modification to the weightings, any word, phrase or sentence that has been selected is given a higher weighting in the summarising process, so that it is more likely to appear in the summary. Where a whole paragraph is selected, then each sentence and each word in it is given a higher weighting. Where a sentence or phrase is selected, the weighting of both the whole of that sentence or phrase and of each word in it is increased.

Where a single word is selected its weighting is increased by a greater factor than if it is just part of a selected phrase or sentence. The weighting accorded to each word, sentence or paragraph is also dependent on the manner in which it has been selected by the pen 201. For example, where a word is circled it is given a higher weighting than if it is only underlined, and a double underlining or a double line in the margin results in a higher weighting than a corresponding single mark. When the summary has been produced, it can either be saved as a separate document, with or without links to the original document, or appended to the original document, with or without navigation links back to the original position of the selected text.
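The weighting and selection of sentences in steps 402 to 404 can be sketched as follows. This is a minimal sketch under stated assumptions: the boost factors, the small stop-word list and the function names are all hypothetical, only word-level selections are modelled, and the scoring (a sum of word weightings per sentence) is one simple choice among many.

```python
import re
from collections import Counter

# Assumed boost factors: a ring outranks an underline, a double mark a single one.
BOOST = {"underline": 2.0, "double_underline": 3.0, "ring": 4.0}
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in"}  # assumed list


def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarise(text, selections, n=1):
    """Return the n highest-scoring sentences, in their original order.

    `selections` maps a selected word to the kind of mark used to select it.
    """
    # Step 402: base weighting from the number of occurrences of each word.
    weights = {w: float(c) for w, c in Counter(content_words(text)).items()}
    # Step 403: raise the weighting of selected words according to the mark.
    for word, kind in selections.items():
        if word.lower() in weights:
            weights[word.lower()] *= BOOST[kind]
    # Step 404: score each sentence and keep the best, earliest first on ties.
    sentences = split_sentences(text)
    scored = [(sum(weights[w] for w in content_words(s)), i)
              for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda p: (-p[0], p[1]))[:n]
    return [sentences[i] for _, i in sorted(best, key=lambda p: p[1])]


text = ("The pen records strokes. The printer prints the pattern. "
        "The camera reads the pattern.")
unmarked = summarise(text, {})
marked = summarise(text, {"strokes": "ring"})
```

With no selections the repeated word "pattern" dominates the scoring; ringing "strokes" is enough to bring the first sentence into the summary instead.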

The content can include features other than text, and the summary may also include copies of, or simplified or modified versions of, selected drawings, diagrams or tables, or any other selected content. For example, the original document may contain drawings of a large number of items, for example in the form of a catalogue, together with the name of each item and a description of each item. In this case, if the title or a part of the description is selected, then the drawing, either alone or with the title or part of the description, can be incorporated into the summary. Alternatively if the drawing is selected, then part of the description or the title, either with or without the drawing, can be incorporated into the summary. Another example of an original document including drawings is a technical description that includes graphs, drawings and tables. In this case, where the reference document includes a summary of a section of the description then it can be arranged also to include any graphs, drawings or tables associated with that section.

A further option that can be selected is the production of a modified document in which definitions or translations of the selected terms are added to the document. In this case the PC 200 needs access to suitable dictionaries, either single language dictionaries giving definitions of words in the language in which the document is written, or foreign language dictionaries giving translations from the language of the document 2 into another language. These dictionaries may be available on the PC 200 or a local network, but in this example, as shown in FIG. 5, the PC is internet connected and the dictionaries 250, 252 are accessed over the internet 254. If the user requests same-language definitions for selected words or phrases, then the application accesses the same language dictionary 250 via the internet and obtains suitable definitions. These definitions can be inserted into the electronic document 218, for example in the form of footnotes or in parentheses after the selected terms. This is particularly suitable if the document is to be printed out again for re-reading. Alternatively links to the relevant definitions can be associated with each of the words in the electronic document 218, so that the definitions can be accessed when viewing the document on screen. If the user selects translation of the selected terms, then the foreign language dictionary 252 is accessed, and suitable translations obtained and treated in the same manner as the same language definitions described.
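Insertion of definitions in parentheses after the selected terms can be sketched as below. The sketch is illustrative only: the dictionary is a hypothetical in-memory mapping standing in for the internet-accessed dictionaries, the function name `annotate` is an assumption, and only the first occurrence of each term is annotated.

```python
import re


def annotate(text, selected_terms, dictionary):
    """Insert '(definition)' after the first occurrence of each selected term.

    `dictionary` is an assumed stand-in mapping lower-case terms to definitions.
    """
    for term in selected_terms:
        definition = dictionary.get(term.lower())
        if definition is None:
            continue  # no entry found, so the term is left as it is
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        text = pattern.sub(lambda m: f"{m.group(0)} ({definition})", text, count=1)
    return text


annotated = annotate(
    "The nib touches the page.",
    ["nib"],
    {"nib": "the writing tip of a pen"},
)
```

Translations could be treated in the same manner by supplying a mapping from terms to their foreign language equivalents.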

Another option that can be selected is the creation of an index to the selected terms. In this case, referring to FIG. 6, the application 226 identifies at step 601 each of the selected terms, and determines at step 602 the page in the document 2 on which it occurs, as well as the line of the page on which it occurs. It then produces at step 603 an index list of the selected terms, and adds, at step 604, an indication of the page number and line number at which each term occurs. As well as the specific occurrence selected by the user, the application 226 is arranged to identify at step 605 all occurrences of the same term throughout the document 218 and identify them all in the index by recording their page and line numbers at step 606. The index list also includes, for each term, a link to the selected term in the position where it was selected in the original document, added at step 607.

The indexed terms are then ordered in the required manner at step 608, for example alphabetically, to form the final index. This index can either be appended to the original document 218 or saved as a separate document.
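The index building steps of FIG. 6 can be sketched as follows. This is a simplified sketch: the document is assumed to be available as a list of pages, each a list of lines, the function name `build_index` is hypothetical, and the links of step 607 are omitted.

```python
import re


def build_index(pages, selected_terms):
    """Return [(term, [(page, line), ...]), ...] ordered alphabetically by term."""
    index = {}
    for term in selected_terms:  # step 601: each selected term
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        occurrences = []
        # Steps 602 to 606: record page and line numbers of every occurrence.
        for page_no, lines in enumerate(pages, start=1):
            for line_no, line in enumerate(lines, start=1):
                if pattern.search(line):
                    occurrences.append((page_no, line_no))
        index[term] = occurrences
    # Step 608: order the indexed terms, here alphabetically.
    return sorted(index.items(), key=lambda item: item[0].lower())


pages = [
    ["The pen reads the pattern.", "A camera is in the pen."],
    ["The pattern is unique."],
]
index = build_index(pages, ["pen", "pattern"])
```

The resulting list can be rendered as text and appended to the document, or saved separately.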

Another option that can be selected is for the selected text to be interpreted as defining a purchase list indicating parts of the document 218 that the user would like to purchase one or more electronic copies of. This is particularly relevant where a user can obtain hard copies of a document free of charge, but can only obtain electronic copies for payment. The selected text can be identified, for example, by highlighting one or more headings which selects the sections or chapters under the headings. Alternatively the selected text can be identified by simply marking in the margin the required text. In either case the ordering can be completed by making payment to the owner of the document and downloading the required electronic copy.

A further option that can be selected is based on the indexing process described above, but is extended to form an information summary covering many documents that the user has read and marked with the pen 201. In this case the summary also acts as an aid to the retrieval of information from all the documents that have been read. As the index is built up it includes not only the page and line references of the selected text, but also the identity of the document in which it was selected. The summarising function described above is also included in this option, so that the index includes, for some of the indexed terms selected by the user, a summary of the passage in which they originally occurred. The extent of the passage that is summarised can also be selected by the user, for example using a line in the margin similar to the line 24 in FIG. 1. Where the user does not define the passage to be summarised, the application 226 selects a default amount of text, in this case the paragraph in which the selected term occurred.

An extension to the multiple document summary described above is also provided whereby the summary is extended to cover not only documents that the user has read, but also documents that they have not read. Referring to FIG. 7, in this case the application 226 has to define at step 701 an identified set of documents that are to be included. In this case the set comprises all of the documents in the memory of the PC that the application can access. However, it can include all documents on a local network, all documents on the internet, or one or more groups of documents from the network or internet. Then at step 702 a basic index is created using the steps of FIG. 6. When the index has been created for the document that has been read, the application carries out a search among all of the documents in the set, to identify any others that contain information relevant to the index term. The first step 703 of the search is simply to identify other documents that contain the index term. A further step 704 is to give a relevance weighting to those documents based on the number of times the term occurs, or the similarity of the context in which it occurs, which can be determined by comparing words around the indexed term. When the further documents have been identified, a textual reference to them, a link to them, and a summary of them are added to the index at steps 705, 706 and 707 respectively.
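Steps 703 and 704 can be sketched as a search that weights each further document by the number of times the index term occurs in it. The sketch below is illustrative only: the document set is assumed to be a mapping from names to text, the function names are hypothetical, and the alternative weighting by similarity of context is not attempted.

```python
import re


def count_term(text, term):
    """Number of whole-word, case-insensitive occurrences of the term."""
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE))


def related_documents(documents, term, exclude=()):
    """Return [(name, occurrences), ...] for documents containing the term."""
    scored = []
    for name, text in documents.items():
        if name in exclude:
            continue  # documents already read and indexed are skipped
        occurrences = count_term(text, term)
        if occurrences:  # step 703: the document contains the index term
            scored.append((name, occurrences))  # step 704: relevance weighting
    return sorted(scored, key=lambda p: (-p[1], p[0]))


documents = {
    "read.txt": "pattern pattern",
    "a.txt": "The pattern covers the page. The pattern is unique.",
    "b.txt": "A pattern of dots.",
    "c.txt": "No match here.",
}
related = related_documents(documents, "pattern", exclude={"read.txt"})
```

Each entry returned could then be given a textual reference, a link and a summary in the index, as in steps 705 to 707.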

It will be appreciated that in the example just described, the index or summary serves not only as an index but also as a summary of documents read by the user and as a search tool to enable the user to find and read further documents that may be of interest. A further option which is available is for the application 226 to carry out an advanced search function. If the advanced search is selected, the search is carried out not on each selected term individually, but on a combination of a number of selected terms. In this case the documents identified by the search are listed in a search results document and ranked in order of the number of the selected terms that occur in them. A summary of each of the selected documents, or passages from them, can also be included in the search results document.
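Ranking by the number of selected terms that occur in each document can be sketched as follows. This is a minimal sketch assuming single-word terms and a document set held as a mapping from names to text; the function name `advanced_search` is hypothetical.

```python
import re


def advanced_search(documents, terms):
    """Rank documents by how many of the selected terms occur in them."""
    results = []
    for name, text in documents.items():
        tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
        matched = sorted(t for t in terms if t.lower() in tokens)
        if matched:
            results.append((name, matched))
    # Most matched terms first; ties are ordered by document name.
    return sorted(results, key=lambda r: (-len(r[1]), r[0]))


documents = {
    "a": "pen and camera",
    "b": "pen camera pattern",
    "c": "printer",
}
ranking = advanced_search(documents, ["pen", "camera", "pattern"])
```

The ranked list corresponds to the search results document; summaries or passages could be attached to each entry.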

Referring to FIG. 8, in a second embodiment of the invention a number of PCs 300 are networked together on a local area network (LAN) with a network server 303, and a printer 302. A number of pens 301 are provided, each of which has its own unique identity number. Referring to FIG. 9 the server 303 includes all of the functional units of the PC of FIG. 2, which are indicated by the same reference numerals increased by 100. The network is set up for use by a number of users, each of whom has their own user name which is stored on the network server 303. The server 303 is provided with an internet connection. When a user logs onto the network using one of the PCs 300 they input their user ID so that the server can associate all actions that they take with their user ID. A user can access documents 318 stored on the server 303, and other documents stored elsewhere via the internet. Each user can print off hard copies of the documents 318 with position identifying pattern on them and read them, marking them with one of the pens 301.

For each user, as identified by the user ID or by the pen 301 that they use, the server 303 can provide a summary, index, or searching facility as in the first embodiment of the invention described above. However, the server 303 is also arranged to produce similar summaries, indexes and search facilities jointly for groups of two or more of the users, or indeed all of the users. For example, where all of the users are working on a joint project and therefore reading documents relating to that project, a single index is built up based on the pen strokes recorded by all of the users. As described above, this index can include a list of relevant terms, summaries of passages and documents read, and lists and summaries of further documents that have not been read but that might be relevant or of interest.

It will be appreciated that the different users can be identified in a number of different ways, for example using writing style analysis or using a biometric identification system linked to the network, such as a fingerprint or iris recognition system.

A further option that is available in the multiple-user system is for the pen stroke data from all of the users to be combined to form a record, stored on the server 303, of which documents, and which parts of which documents, have been read by which users, and at what times. This data can be combined to produce a summary of the levels of reader interest in each of the documents, indicating for example which are the documents of most interest, which are the documents of least interest, and which groups of readers have shown the most and least interest in any particular document or group of documents. This summary acts as an aid to the users to help them identify the most relevant documents and to extract the most relevant information from those documents.
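The combination of pen stroke records into a summary of reader interest can be sketched as below. The record layout (user, document, time), the use of the number of strokes as the measure of interest, and the function name `interest_summary` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def interest_summary(records):
    """Summarise (user, document, timestamp) stroke records.

    Returns the documents ranked by total number of strokes, and for each
    document the number of strokes made by each user.
    """
    per_document = Counter()
    per_document_user = defaultdict(Counter)
    for user, document, _timestamp in records:
        per_document[document] += 1
        per_document_user[document][user] += 1
    ranking = sorted(per_document.items(), key=lambda p: (-p[1], p[0]))
    by_user = {doc: dict(counts) for doc, counts in per_document_user.items()}
    return ranking, by_user


records = [
    ("ann", "spec", 1),
    ("ann", "spec", 2),
    ("bob", "spec", 3),
    ("bob", "memo", 4),
]
ranking, by_user = interest_summary(records)
```

The first and last entries of the ranking identify the documents of most and least interest, and the per-user counts show which readers contributed to each.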

It will be appreciated that, in the embodiments described above, the position of the pen strokes on the printed copy of the document can be determined in any of a number of ways. For example the printed document can be placed on a detection system that is arranged to track movements of the pen relative to sensors within the detection system, such as infra-red or magnetic sensors.

In a further modification to the embodiments described above, the document is not printed out at all, but is viewed on a screen, and the pen is replaced by a light pen. The light pen includes a photo sensor, and when it is held at a point on the cathode ray tube (CRT) screen, it detects when light is emitted from that point. This information is transmitted to the CRT controller, which controls the position of the CRT electron beam and hence can determine when light will be emitted from each point on the screen.

This enables the CRT controller to determine the position of the pen on the screen. This system therefore enables the user to read the document on screen and make pen strokes on the screen using the light pen. These pen strokes are then interpreted in the same way as the pen strokes in the embodiments described above, using data in the CRT controller that indicates the position on the screen of the content features of the document. In such a system the pen, like that in the previous embodiments, has a tip that can be brought into contact with the representation of the document, and moved over the representation of the document to make the pen strokes. This allows the user to interact closely and directly with the document, in a manner that is familiar to users of conventional pen and paper.

In a further modification, the document is displayed on a tablet PC or other device having a touch sensitive screen. In this case the pen comprises a simple pointer or stylus that can be brought into contact with, and moved across the surface of, the touch sensitive screen, to make the pen strokes. The pen stroke data is then captured by the touch sensitive screen and processed as in the previous embodiments.

1. A system for assisting a user in extracting information from a document set including at least one original document having content, the system comprising: a pen arranged to be moved over a representation of the original document to define pen strokes, a recording system arranged to record the position of the pen strokes on the representation, and a processor arranged to interpret the pen strokes as identifying selected parts of the content and to produce a reference document relating to the document set, the content of the reference document being dependent on the selected content.
 2. A system according to claim 1 wherein the reference document includes a copy of the document set with additional content added to aid re-reading of the document.
 3. A system according to claim 1 or claim 2 wherein the reference document includes a copy of the document set with a link to additional content to aid re-reading of the document.
 4. A system according to claim 2 or claim 3 wherein the additional content is arranged to highlight at least part of the selected content.
 5. A system according to any of claims 2 to 4 wherein the content comprises text and the additional content includes a translation of at least one part of the selected text.
 6. A system according to any of claims 2 to 4 wherein the content comprises text and the additional content includes a definition of at least one part of the selected text.
 7. A system according to any foregoing claim wherein the reference document includes an index to the selected content.
 8. A system according to claim 7 wherein the index identifies the position of the selected content in the document set.
 9. A system according to claim 8 wherein the content comprises text and the selected text is used to form indexed terms in the index.
 10. A system according to any of claims 7 to 9 wherein the index includes a link to the original position in the document set of at least a part of the selected content.
 11. A system according to any foregoing claim wherein the reference document includes a summary of at least a part of the document set.
 12. A system according to claim 10 wherein the processor is arranged to prepare the summary taking into account the selected text.
 13. A system according to claim 11 wherein the processor is arranged to interpret the pen strokes as selecting the text in a plurality of different ways, and, when preparing the summary, to take into account the way in which the selected text is selected.
 14. A system according to claim 12 or claim 13 wherein the processor is arranged to prepare the summary on the basis of a weighting which it defines for different parts in the original document, and the weighting of the selected text is modified in response to its having been selected.
 15. A system according to any foregoing claim wherein the set includes a plurality of original documents and the reference document refers to each of the original documents.
 16. A system according to any foregoing claim wherein the processor is arranged to search for other documents using a search strategy determined by the selected content, and to include the other documents in the set.
 17. A system according to claim 16 wherein the reference document identifies the documents in the set.
 18. A system according to claim 16 or claim 17 wherein the reference document includes an indication of the relevance of at least one of the documents in the set.
 19. A system according to claim 18 wherein the processor is arranged to determine the relevance on the basis of the selected content.
 20. A system according to claim 13 or claim 14 wherein the reference document includes at least one link to each document in the set.
 21. A system according to any foregoing claim arranged to identify a plurality of users, and to produce one reference document for each of the users, using pen strokes made by the respective user.
 22. A system according to claim 21 arranged to identify each user on the basis of the identity of the pen that made the pen strokes.
 23. A system according to any foregoing claim wherein the reference document is an electronic document.
 24. A system according to any of claims 1 to 22 wherein the original document is an electronic document.
 25. A system for assisting a user in the extraction of information from a document set including at least one original document having content, the system comprising: a recording system arranged to receive data defining the position of pen strokes made on a representation of the document by a pen, and a processor arranged to interpret the pen strokes as identifying selected parts of the content and to produce a reference document relating to the document set, the content of the reference document being dependent on the selected content.
 26. A system for assisting a user in extracting information from a document set including at least one original document having content, the system comprising: a pen arranged to be moved over a representation of the original document to define pen strokes, a position determining means arranged to determine the position of the pen strokes on the representation, and processing means arranged to interpret the pen strokes as identifying selected parts of the content and to produce a reference document relating to the document set, the content of the reference document being dependent on the selected content.
 27. A system for assisting a user in extracting information from an original document having content, the system comprising a manually operable selecting device, operable in conjunction with a representation of the original document to select portions of the content, and a processor arranged to produce a reference document relating to the original document, the content of the reference document being dependent on the selected portions.
 28. A system according to claim 27 wherein the content includes text, and the selecting device is arranged to select, from the text, at least one of a word, a phrase, a sentence and a paragraph.
 29. A method of extracting information from a document set including at least one original document having content, the method comprising: making pen strokes with a pen on a representation of the document, recording the position of the pen strokes on the representation, interpreting the pen strokes using a processing means as identifying selected parts of the content on the document and producing using the processing means a reference document relating to the document set, the content of the reference document being dependent on the selected content.
 30. A data carrier carrying data arranged to control a computer system to operate as a system according to any of claims 1 to 28 or to carry out the method of claim 29.